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Abstract 

Signalling off-chip requires significant current. As a result, a chip's power-supply current 
changes drastically during certain output-bus transitions. These current fluctuations cause a 
voltage drop between the chip and circuit board due to the parasitic inductance of the power- 
supply package leads. Digital designers often go to great lengths to reduce this transmitted 
noise. Cray, for instance, carefully balances output signals using a technique called differential 
signalling to guarantee a chip has constant output current. Transmitted-noise reduction costs 
Cray a factor of two in output pins and wires. Coding achieves similar results at smaller costs. 

In a circuit using parallel-terminated transmission lines such as the Cray-1, a chip exhibits 
nearly constant output current throughout a single clock cycle. However, since current for an 
output 1 differs greatly from that for an output 0, output current can change drastically at the 
end of a cycle generating an intolerable amount of transmitted noise. Hence, a coding scheme 
which maintains a constant number of output l's will reduce transmitted noise as successfully 
as differential signalling. Experimental results substantiate this. 

Information capacity of a word is greatest when the word has half l's. Coding requires 
fewer output signals than differential signalling. For example, an 8-bit byte can be encoded 
as eleven bits with four l's at an output savings of 30%. Of course, fewer outputs do not 
come for free; coding costs time and chip area. Coding achieves a smaller percentage output 
expansion for words longer than eight bits but requires a more costly coding scheme. 

In a circuit using series-terminated or un-terminated transmission lines, output current 
flows when a signal changes state to charge/discharge the capacitive load. Hence, reducing 
transmitted noise requires minimizing output transitions. Representing transitions as l's, a 
coding strategy which minimizes l's will reduce transmitted noise. Unfortunately, words with 
few l's have little information capacity. Coding in this case requires an exponential increase 
in output signals. Hence, only a 2x to 3x noise reduction is practical. 
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Chapter 1 



Introduction 



Noise hampers the performance of digital circuits by corrupting otherwise valid logic signals. 
For the purposes of this document, two types of noise exist: internal (or self) noise and 
transmitted noise. As the names suggest, the former refers to noise confined to a single 
integrated circuit (or chip) whereas the latter refers to noise in the communication between 
chips [Dil88]. This document discusses techniques for reducing a common form of transmitted 



noise. 1 



1.1 Transmitted Noise in Digital Circuits 

In digital circuits chips communicate with signals at one of two voltage levels. A chip's power 
supplies act as sources of transmitted signals and references for received signals. Chip-to-chip 
communication can be successful only if each chip operates with the same source and reference 
voltages. Transmitted noise can cause power-supply voltages from chip to chip to disagree 
and, hence, contribute to miscommunication. 

Due to V = L(dl/dt) and the parasitic inductance of package leads, a voltage drop occurs 
between chip and circuit board during a power-supply current fluctuation. Since all chips in 
a circuit will not suffer from the same voltage drop, this is a form of transmitted noise. My 
thesis attempts to alleviate this problem. 



'Since the cause of the transmitted noise can be identified and removed, theorists may dispute the use of 
the word noise in this case. However, this is noise from a digital designer's perspective and is referred to as 
such in electronics literature. 



Since signalling off-chip requires significant current, a chip's output drivers can generate a 
prohibitive amount of transmitted noise. For this reason ECL chip manufacturers provide a 
separate ground pin, often labeled Vcco, for the output drivers as an attempt to isolate them 
from the internal logic circuits. "When switching all the outputs on a Vcco P m except one, 
a noise pulse of about 2ns wide is coupled to the non-switched outputs due to cross-talk and 
lead inductance." [PB88] Digital designers often go to great lengths to reduce transmitted 
noise. Cray, for instance, carefully balances output signals using a technique called differential 
signalling to guarantee a chip has constant output current. Transmitted noise reduction costs 
Cray a factor of two in output pins and wires [K0I8I]. Coding achieves similar results at 
smaller costs. 

Digital designers employ many techniques to reduce transmitted noise. These techniques 
fall in two categories based on V = L(dl/dt): reducing L and reducing dl/dt. The parasitic 
inductance of package leads is approximately equal to l.OnH/mm. (This value can be derived 
from basic principles.) Although this seems small, since current through power and ground 
pins on parts can reach ampere levels, the package lead inductance is a significant noise source 
[TK85]. The magnitude of L can be reduced in two ways. ECL chip designers shorten power- 
supply package leads by moving them from their traditional TTL locations (top, right and 
lower, left) to the center of the package [BI088]. Adding power-supply pins also reduces L since 
adding inductors in parallel reduces total inductance. Most large chips employ this technique. 
The intel i860 microprocessor, for instance, has twenty-four ground pins and twenty-four power 
pins [Int89]. 

Digital designers achieve a reduction in dl/dt in several ways. Intel uses weak output 
drivers in its i860 to slow output-signal edges and, hence, reduce dl/dt [Int89]. The Motorola 
ECL gate array guidelines suggest using the output edge rate slow down option to reduce 
the noise generated by the output drivers. "This option adds 500ps of delay to the output 
because it slows the output signal edge rates; however, it also slows the rate of change of 
current pulled through the Vcco pins, which cuts the maximum noise amplitude by 50mV." 
[PB88] The staggering of output signals and the slowing of clock edges also reduces dl/dt. 




Figure 1-1: For circuits signalling with parallel-terminated transmission lines, a coding strat- 
egy which maintains a constant number of output l's reduces transmitted noise. 

1.2 Coding to Reduce Transmitted Noise 

This document presents techniques for reducing power-supply lead dl/dt through the coding 
of output signals. For the purposes of this document, two types of digital circuits exist: circuit 
where chips signal with parallel- terminated transmission lines and circuits where chips signal 
with either series-terminated or un-terminated transmission lines. A different noise-reduction 
coding strategy will be developed for each case. 

In a circuit using parallel- terminated transmission lines such as the Cray-1 [K0I8I], a 
chip exhibits nearly constant output current throughout a single clock cycle. However, since 
current for an output 1 differs greatly from that for an output 0, output current can change 
drastically at the end of a cycle generating an intolerable amount of transmitted noise. Hence, 
a coding strategy which maintains a constant number of output l's will reduce transmitted 
noise as successfully as differential signalling. My experimental results substantiate this. 

In a circuit using series-terminated or un-terminated transmission lines, output current 
flows when a signal changes state to charge/discharge the capacitive load. Hence, reducing 
transmitted noise requires minimizing output transitions. Representing transitions as l's, a 
coding strategy which minimizes l's will reduce transmitted noise. 




Figure 1-2: For circuits signalling with series-terminated of un-terminated transmission lines, 
a coding strategy which minimizes l's reduces transmitted noise. 

1.3 Implications of Coding 



Coding to reduce transmitted noise does not come without costs. Coding requires design 
time, execution time, chip area, and output pins. This document discusses general coding 
approaches but does not recommend an all-purpose coding strategy. Therefore, a considerable 
engineering effort will be required to customize the general ideas presented here to a specific 
application. The other costs can be discussed more concretely. Time is required to encode 
the output signals prior to transmission and decode the input signals after reception. In a 
pipeline environment, however, the cost of an additional stage or two should not be too severe. 
Encoding and decoding circuitry occupy chip area. This cost will be examined in depth, later, 
when discussing specific coding schemes. 

Coding requires additional output pins. As an example, consider a microprocessor chip 
with an iV-bit output bus. The chip suffers from transmitted noise, and we have decided 
to use coding to reduce the generation of noise. Coding restricts the chip's output signals. 
Information theory dictates that, because of this restriction, more than N signals are required 
to transmit N bits of information. The reasoning is as follows. 

Define p as the probability of an output signal being a 1. On the average, the amount of 
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Figure 1-3: The information capacity of 32-bit words plotted as a function of the number of 
l's in the word. 

information (in bits) transmitted by each signal is H(p), where H(p) = —p*log2{p) — (1 — p)* 
/o<72(l — p)- The amount of information transmitted by the entire word is therefore N * H(p). 
Prior to coding, we have output signals which are equally likely to be 1 or 0, so p — 1/2. 
H(p= 1/2) = 1, so one bit of information is transmitted per signal and N bits for the entire 
word. Coding places a restriction on usable output words, so p is no longer equal to 1/2. This 
forces H(p), the average amount of information per signal, to be less than one bit. Therefore, 
more than JV signals are required to transmit N bits of information [Hum85]. 

What does all this mean for the two types of coding discussed above? The entropy (or 
information capacity) plot for a 32-bit word shown in Figure 1-3 will help explain. The plot 
shows good news for the parallel- terminated case and bad news for the series-terminated/un- 
terminated case. The information capacity of a word is greatest where the word has half 
l's. Therefore, for the parallel- terminated case, to maintain a constant number of output l's, 
choose the constant to be approximately half the word length. Since so many words of this 
type exist, few extra output signals are necessary. For example, an 8-bit byte can be encoded 



as eleven bits with four l's. This achieves the noise reduction of differential signalling at an 
output savings of more than 30%. Coding achieves a smaller percentage output expansion for 
words longer than eight bits. For example, a 32-bit word can be encoded as thirty-five bits 
with seventeen l's. Unfortunately, longer words require a more costly coding scheme. 

For the series-terminated/un-terminated case minimizing l's cannot be as successful since 
such a small number of words with few l's exists. Here, coding requires an exponential increase 
in output signals. Hence, only a 2x to 3x noise reduction is practical. 

No noise-reduction scheme is without its sacrifices. (The only alternative to reducing noise 
is to slow a circuit's clock and live with it. Of course, this represents a substantial performance 
loss.) The techniques discussed for reducing L restrict package design and increase the pinout 
of a chip. The techniques discussed for reducing dl/dt hurt a chip's performance by by 
reducing its output bandwidth. Coding requires time, chip area, and additional output pins 
but allows a designer to trade-off these characteristics to suit an application. 

The next chapter includes preliminary details not appropriate for an introductory chapter. 
It explains in more detail some of the ideas already presented as well as introduces several 
noise-reduction coding terms coined to make the remainder of the document more readable. 



Chapter 2 



Explanation of Terminology 



This document discusses the development of totally new coding techniques. Scientific lit- 
erature contains little investigation of low weight and constant weight coding and certainly 
nothing directly applicable to this problem. For this reason coding terms have been coined to 
ease the reading of this document. 

Coding's primary goal here is to reduce current fluctuations on a chip's power-supply pins. 
As mentioned previously the cause of these fluctuations fall in one of two categories, depending 
on a circuit's chip-to-chip signalling technology. The next section of this document examines 
this distinction. Later, separate noise-reduction coding schemes will be developed for each 
category. 

2.1 Segregating Digital Systems for Coding 

The coding strategy depends on the interchip communication technology of a digital circuit. 
Three communication technologies will be discussed in this document: parallel-terminated, 
series- terminated, and un-terminated transmission lines. (Transmission line refers to any 
type of wire connecting two chips, including coaxial cable, PC-board trace, and wire-wrap 
wire.) Figures 2-3 through 2-5 depict the three technologies schematically. Each contains a 
transmitting-chip output driver, a transmission line, and a capacitive load. Cload represents 
the capacitance of the interconnect as well as that of the receiving-chip input. The output 
driver and capacitive load are identical in all three cases. 
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Figure 2-1: A model for signalling with a parallel-terminated transmission line. 



This document categorizes circuits according to interchip, steady-state current flow. Cir- 
cuits using parallel-terminated transmission lines form one group- those that exhibit steady- 
state current. When signalling over parallel-terminated lines, a path of low resistance exists 
in the steady state between a transmitting-chip power supply and that of the terminating 
voltage. Referring to Figure 2-3, when Vout changes logic value, current flows to/from the 
output driver to charge/discharge Cload- Even after Cload is fully charged/discharged, 
current continues to flow because of Rj. The terminating voltage (Vtt) is n °t equal to either 
logic-value voltage. Hence, steady-state current flows between the output driver and Vtt- 

Circuits using series- terminated or un-terminated transmission lines form the second group- 
those that exhibit negligible steady-state current. Minimal steady-state current flows in such 
circuits because of the high input resistance of receiving gates. A large current flows when an 
output switches logic value, but this current decays to near zero in the steady state. Referring 
to Figures 2-4 and 2-5, when Vout switches, current flows to/from the output driver charg- 
ing/discharging Cload- Once Cload is fully charged/discharged, however, current virtually 
ceases. There is no resistive element to sustain steady-state current. The input resistance of 
a receiving gate is so large when compared with Rt that it is effectively infinite and has been 
ignored in the figures. 

The coding strategy is independent of the circuit technology of a digital system. Reference 
to circuit technology (e.g., ECL, TTL, or CMOS) is unnecessary since it is not a determining 
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Figure 2-2: A model for signalling with an un-terminated transmission line. 
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Figure 2-3: A model for signalling with a series-terminated transmission line. 



factor as far as coding to reduce noise is concerned. 

ECL is a popular, high-performance circuit technology. Chips in ECL circuits are com- 
monly connected with parallel- terminated transmission lines. These circuits fall in the cate- 
gory of systems which exhibit steady-state current in their interconnection network and are a 
primary motivation of this document. It is important to understand, however, that a CMOS 
circuit using parallel-terminated transmission lines falls in the same category. An often pro- 
claimed virtue of CMOS circuits is that they dissipate no static power. However, if CMOS 
chips are connected with parallel-terminated lines, then static power will be dissipated in the 
interconnection network. 

CMOS is the dominant, high-performance circuit technology of our time. Chips in a 
CMOS circuit are often connected with un-terminated transmission lines. These circuits fall in 
the category of systems which exhibit negligible steady-state current in their interconnection 
networks and are another primary motivation of this document. Remember, chips of any 
circuit technology connected with series-terminated or un-terminated transmission lines fall 
in this category as well. 

2.2 A Bit of Coding Jargon 

Coding accomplishes noise reduction by limiting the variability of successive transmitted 
words. This in turn reduces the current variability on the transmitting chip's power-supply 
pins, the fundamental cause of the noise. Coding reduces the variability of words by restrict- 
ing the number of l's in the word. Coding and information theory define this number of l's 
as the weight of the word. Because of this definition and to ease the reading of this paper, 
three weight-based, noise-reduction coding terms have been coined. 

iV-bit words of unrestricted weight have maximum weight N and minimum weight zero. 
Coding so as to reduce the maximum weight will be called starvation coding, specifically, 
starting with words of unrestricted weight, coding so as to limit words to weight J or below 
will be referred to as starving the words to weight J. In contrast, coding so as to increase the 
minimum weight will be called indulgence coding. Limiting words to weight K or above will 
be referred to as indulging the words to weight K . 

Another useful technique involves coding so as to keep the weight at a constant or nearly 

10 




Figure 2-4: Coding and information theory's concept of weight leads to three coding terms. 




Figure 2-5: Starvation coding reduces a word's weight. 
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Figure 2-6: Indulgence coding increases a word's weight. 




Figure 2-7: Ration coding keeps a word's weight constant. 
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constant level. This type of coding will be called ration coding. Limiting the weights of words 
to the range J to K will be referred to as rationing the words to weights J through K. Keep 
it in mind that all three techniques involve discarding coded words which lie outside a desired 
range. As a result the coded words require a greater number of bits to convey the same 
amount of information as their uncoded predecessors. 

Throughout this document I will use the word bit in two ways, both as a digit in the 
number base 2 (often synonymous with the word signal or wire) and as a unit of information. 
I will use the phrase bit of information when referring to the latter definition and there could 
be confusion [Ham80]. Also, I will use N for the number of bits in the output word prior to 
coding and M for the expanded number of bits after coding (hence, M > N). 
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Chapter 3 

Reducing Transmitted Noise Using 
Ration Coding 



This chapter is devoted to coding to reduce transmitted noise in circuits exhibiting steady- 
state current in their interconnection networks. The conclusions reached apply to all such 
circuits. Because of its popularity, this document will often refer to a particular technology, 
ECL chips communicating with parallel- terminated transmission lines, though the results are 
more generally applicable. 

3.1 An Example of Transmitted Noise in an ECL Circuit 

Bipolar Integrated Technology's B3011 16x16 multiplier-accumulator is a 132-pin ECL chip 
capable of driving forty signals off chip simultaneously [Bip86]. The wide output bus makes 
the B3011 susceptible to generating transmitted noise. The output, emitter-follower circuits 
are shown schematically in Figure 3-1. Because all ECL chips have similar output circuitry, 
the problem described here pertains in varying degree to the entire ECL family. For this 
example, the chip is driving 50ft transmission lines terminated in parallel to Vtt = —2. OF. 
Furthermore, assume there is just one output ground pin. This pin is labeled Vcc2 with pin 
and bonding wire inductance L as shown. 

Consider a transmitted-noise, worst-case scenario; all forty outputs are making low to 
high transitions. These transitions cause a large change in current on the Vcc2 pin- With 
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Figure 3-1: 40-bit parallel output circuit of the B3011 driving parallel-terminated transmission 
lines. Figure assumes one output ground pin (Vcc2)- 
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Voh = -0.9V, Vol = -1.75V, and a transition time Tlh = 1.5ns [BI088], the output current 
rate of change, dlour/dt, can be approximated as follows: 

/, = 2^2v=«^ (3-D 



(2.0-0.90)V , , N 

lH = 50ft = 22 -° mA ^ 

d Jw Z ^In_ lL= (22.0 - 5 y = x 7 

<ft T L // 1.5 x 10~ 9 5 ' v ; 

There are forty outputs changing, hence, the Vcc2 current rate of change, dlcc2/dt, can be 
approximated as follows: 

d j££l « 40(^-^) = 4.53 x 10 8 A/, (3.4) 

at l LH 

According to V = L(dl/dt) this change in current will cause a voltage drop across the 
pin and bonding wire inductance. Package wiring has an inductance of roughly lO.OnH/cm. 
Assuming the pin and bonding wire together measure 1.0cm, the voltage drop (Vnoise) can 
be approximated as follows: 

Vnoise = H^P-) ~ (10.0nJJ)(4.53 x 10 8 A/s) = 4.6V (3.5) 

This amount of noise forces the emitter-follower circuits completely out of their intended 
region of operation. Let's examine one emitter-follower output circuit with a more instructive 
amount of noise (say, Vnoise = 1.0V). Figure 3-2 depicts the transistor model used in 
this analysis [SS88, BI088]. Figure 3-3 shows the resulting transfer curve, with noise (dashed 
lines) and without noise (solid line). Considering first the circuit when there is no noise, notice 
that the transistor is well within the linear region when signalling high and low logic values. 
However, with 1.0V of noise, the transistor saturates before the output can reach a valid logic 
1 level. One must wait for the noise to go away before a valid 1 can be signalled. 

BIT handles this problem by including six Vcc2 pins on the chip [Bip86]. This reduces 
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Figure 3-2: Transistor model used in emitter-follower analysis. 

the worst-case noise by about a factor of six (from 2.3V to about 0.4F). With 0.4V of noise, 
the emitter-follower transistor barely remains in the linear region when signalling a logic 1 
(see Figure 3-3 again). With noise-reduction coding and less than six additional pins, it is 
possible to virtually eliminate the worst-case noise. 

3.2 Ration Coding as a Solution 

A type of noise-reduction coding, termed ration coding earlier, will nearly eliminate the noise 
in the example of the previous section. From equation 3.5 it is apparent that no noise will be 
generated if the current on the Vcci pin does not change. The Vcci P m carries the current 
for all forty output signals. In order to keep the Vcci current constant, the sum of the forty 
output currents must remain the same. The outputs need not remain static, but the total 
demand from the group must not change. 

Notice from equations 3.1 and 3.2 that more than four times as much current flows when 
signalling a 1 as when signalling a 0. A large change in current results from a transition such 
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Figure 3-3: Emitter-follower transfer curve with noise (dashed lines) and without noise (solid 
line). 
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as WORD(t) = 0000.. .00 to WORD(t + 1) = 1111. ..11, whereas, a negligible change results 
from a transition such as WORD(t) = 0101. ..01 to WORD(t + 1) = 1010. ..10. Therefore, 
if for every high-to-low transition there is a corresponding low-to-high transition, then the 
Vcc2 current will be nearly constant, and negligible noise will be generated. Another way 
to view this is if the chip is signalling the same number of l's from cycle to cycle (this, of 
course, guarantees signalling the same number of 0's as well) then negligible noise will be 
generated. This can be achieved with ration coding. (The Vcci current required to charge 
the receiving-gate input and interconnect capacitances, Cload from section 2.1, cannot be 
simply re-directed from one output to another. For this reason, even when signalling the same 
number of l's from cycle to cycle, small changes in current will occur.) 

A technique called differential signalling was employed to nearly eliminate transmitted 
noise in the Cray-1 computer system [Kol81]. Here, a signal and its inverse transmit one bit 
of information. Differential signalling reduces transmitted noise to its limit since it guarantees 
a constant output current for each information bit but requires a 2x increase in output pins 
and wires. Ration coding achieves similar results at smaller pin count. 

3.3 Theoretic Limits of Ration Coding 

Ration coding was defined earlier as a mapping from words with unrestricted weight (recall 
weight equals the number of l's in a word) to words with constant or nearly constant weight. 
From a power conservation point of view, since in the ECL technology signalling a 1 requires 
more current than signalling a 0, it is desirable to make the constant weight a small number. 
Unfortunately, most of the information-carrying capacity of an Af-bit word is contributed by 
words with weight near M/2. This fact will be demonstrated shortly. 

It was argued in section 1.1.4.1. that noise-reduction coding required the expansion of the 
output-word width (from N bits to M bits). Since a total of M - N additional output pins 
are required, it is desirable to restrict the magnitude of M. Throughout this document the 
notation C(M, W) will be used to represent the number of different M-bit words with weight 
W. Therefore, 
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Figure 3-4: The number of different, constant- weight, 35-bit words plotted as a function of 
weight, C(35,W). 

[Dra67] Figure 3-4 plots this relationship as a function of W with M = 35. Notice that most 
of the information-carrying capacity resides near W = 35/2. 

Figure 3-4 should give the reader a qualitative understanding of how ration coding works. 
Starting with an iV-bit word with unrestricted weight, ration coding expands the word to M 
bits while restricting the weight to near M/2. Of course, ration coding will most successfully 
reduce transmitted noise when words are restricted to a single weight. Ignoring implementa- 
tion details for the moment, what is the minimum number of extra bits (define E = M — N) 
needed when rationing an ./V-bit word to a single weight near M/2 1 

The total number of different iV-bit words is equal to 2^. In order to encode each of these 
iV-bit words as a different M-bit, constant-weight word, enough such words must exist. This 
is to say, M-b\t words with constant weight M/2 will have the information-carrying capacity 
to replace iV-bit uncoded words, only if the following holds: 



C(M,M/2)> 2 



N 



(3.7) 
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or, using equation 3.6, 



Using Stirling's approximation, 



Ml 



(M/2)!(M/2)! 



>2 



N 



(3.8) 



Ml « V2^M M+ We- M 



(3.9) 



an expression for the additional bits required by the encoded word, E, can be derived. (Stir- 
ling's approximation improves as M gets larger. It is within 1% for M = 9 [Ham80].) 
Starting with equation 3.8, Stirling's approximation gives 



simplifying, 



2lr" M M+(i/2) e -M 



27r((M/2)( M / 2 )+ 1 /2 e -M/2)2 
1 mM+(i/2) 



> 2 



N 



( V / 2^ ) W2) M+1 



>2 



N 



(3.10) 



(3.11) 



1 2 M+1 M M+ ( 1 / 2 ) 



( v / 2lr" ) M M + ! 



> 2 



AT 



(3.12) 



and, since M = 2 l ° 92M , 



further simplifying, 



1 2 M+1 
V27 Mi/2 



( ' )Z > o N 



1 2 M+1 



(-s=) 



^ y (2' 92 M )V2 



>2 



N 



(J—)—l >2 N 



(3.13) 



(3.14) 



(3.15) 



/_L_x 2 (M+l)-(l/2)/o g2 M > 2 
V27T 



TV 



taking the logarithm of both sides, 



(3.16) 



-log 2 V2n + M + 1 - (l/2)log 2 M > N 



(3.17) 
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Figure 3-5: The information capacity (in bits) of a constant-weight, 35-bit word plotted as a 
function of weight, log 2 C(35, W). 

finally, substituting M = N + E and solving for E yields 



-log 2 \Fht + N + E + 1 - (l/2)log 2 (N + E) > N 



(3.18) 



E > (l/2)log 2 (N + E) + log 2 y/2^ - 1 



(3.19) 



E > {l/2)log 2 {N + E) + log 2 J-KJ2 



(3.20) 



Unfortunately, this expression is not a simple function of N. Hence, the minimum value of E 
can only be found through iterative solution. 

Using this expression, only three additional bits are required when rationing a 32-bit word 
to weight 17. This is depicted in Figure 3-5 which plots the information capacity of 35-bit, 
constant-weight words. Notice that 32 bits of information can be encoded as 35-bit words 
with weight always equal to 17. (Since C(35, 17) = C(35, 18), 32 bits of information can also 
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be encoded as 35-bit words with weight always equal to 18. It is true for all odd M that 
the same number of constant-weight words exist for the two integer weights nearest M/2. To 
conserve power the smaller weight should be implemented.) 

3.4 Implementations of Ration Coding 

Ration coding may be implemented in many ways. The implementation technologies fall in 
two categories: table look-up and algorithmic implementations. The remainder of this chapter 
contains examples of these techniques. This section is somewhat incomplete; the topic remains 
an active area of research. 

3.4.1 Table Look-Up 

Ration coding can be implemented in a very straightforward manner using look-up tables. 
Design simplicity is one of the chief advantages of this approach. Tough decisions must be 
made regarding the chip area, speed, and additional pins demanded by a particular application. 
These trade-offs will be examined in a moment. 

We can choose positive integers P, Q, R, S, and V such that the following holds: 

C(Q, R-V) + ...+ C(Q, R-l) + C(Q,R) + C(Q,R + 1) + ... + C(Q,R + S) > 2 P (3.21) 

This enables us to encode a P-bit word of unrestricted weight as a Q-bit word with weight 
from R — V to R + S. This, of course, is ration coding. We are rationing a P-bit word to 
weights R-V through R + S. 

Since look-up table area grows exponentially with the number of address bits, it would be 
impractical to build a single ration-coding table in cases like the B3011. A table with forty 
address bits would require a ridiculous amount of chip area (on the order of a square meter). 
In cases where the output-word width, N, is larger than about ten, it will be necessary to 
segment N into manageable size sections. 

Referring to equation 3.21 and to Figure 3-6, we can segment an iV-bit output word into 
N/P sections. Each P-bit section can be rationed to Q bits with weight from R — V to R+ S. 
In total our TV-bit words will be rationed to M-bit words, where M = (N/P)Q, with weights 
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Figure 3-6: Description of parameters for ration coding using table look-up. 
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Table 3.1: Presented are the trade-offs between noise generation, AWmax, output pins, M, 
and table area, At and Ar, for several implementations of ration coding using look-up tables 
(N = 64). 



p 
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N/P 


R 
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AW MAX 


M 


At 


A R 
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10 


7.11 


4 
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1 


15 
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9 


9.14 
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10.67 
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77 
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10.67 
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12.8 
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16 
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1536 
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16 
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80 
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4 


5 


16 


1 


1 


1 


32 


80 


1280 


2048 


3 


5 


21.33 


2 








1 


106 


840 


2016 


3 


4 


21.33 


1 


1 
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85 


672 


1008 



(R- V)(N/P) through (R+ S)(N/P). Table 3.1 contains the design characteristics resulting 
from a variety of P, Q, R, S, and V combinations for N — 64. 

Worst-case, transmitted- noise generation will be proportional to AWmax- Ideally S = 
V = 0, there is no weight variation, and transmitted noise is virtually eliminated. The noise 
generated when using ration coding under these circumstances should be equivalent to that 
generated when using differential signalling. The number of output pins required by ration 
coding is M. The number of output pins required by differential signalling is 2JV. In all cases 
M < 2N. Of course, the price paid for fewer output pins is chip area and encoding and 
decoding time. 

AWmax i M, At, and Ar were a little tricky to calculate. (At and Ar are proportional 
to the total look-up table area required by the transmitting and receiving chips, respectively. 
Actual area can be determined by multiplying these numbers by the area of a single bit cell.) 
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In cases where N is evenly divisible by P, these values were calculated as follows: 

AWmax = (S + V)(N/P) (3.22) 

M = (N/P)Q (3.23) 

A T = (2 P *Q)(N/P) (3.25) 

When (N/P) is not an integer, a decision must be made regarding the treatment of the 
remainder bit(s). This decision affects all four values. For instance, in the case where P — 7, 
it was decided to segment the 64 bits into 9 groups of 7 and pass the remaining bit without 
coding. (Since 9*7 = 63, one bit must bypass the tables.) Therefore, AWmax — 9(5 + V) + 1, 
M = 9Q + 1,A T = 9(2 7 * Q), and A R = 9(2 Q * 7). In the case where P - 5, it was decided to 
segment the 64 bits into 13 groups of 5 creating one dummy table-input bit. (Since 13*5 = 65, 
there is one extra table input.) Therefore, AW M ax = 13(5 + V), M = 13Q, A T = 13(2 5 *Q), 
and Ar = 13(2^ * 5). The table look-up scheme is less efficient when N/P is not an integer. 
While examining Table 3.1, one should keep in mind the following: 

• Look-up table area increases exponentially with the size of the sections, P. 

• The number of pins, M, increases with the number of sections, N/P. 

• Large look-up tables are slow, so speed is inversely proportional to P. 

• Speed is so critical that, despite the fact each of the N/P tables is identical, it is unlikely 
the tables could be multiplexed. 

3.4.2 Algorithmic Implementations 

Ration coding can be implemented using a recursive algorithm. Figure 3-7 contains a block- 
diagram showing two recursions of this algorithm. Starting with an JV-bit output word, 
segment the word into N/P sections of P bits. During the first recursion each P-bit section 
is processed separately as follows: 
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Figure 3-7: Ration coding implemented with a recursive algorithm. 
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• Calculate the weight, W\, of the section (0 < W\ < P). 

• Test whether W\ is less than or exactly (1/4)P away from half the total number of bits 
in the section (i.e., test (1/2)P- (1/4)P = (1/4)P < Wi < (1/2)P + (1/4)P = (3/4)P). 

• If the test returns false, signal 1.1/ = 0. This selects the inverted form of half the 
section's bits. (I is an initial for invert, 1 refers to the first of P sections, and / denotes 
that the signal is active low.) The inversion forces the section's weight into the given 
range. Also, include 1.1/ = in the section's bits to inform the receiver of the inversion. 

• If the test returns true, signal 1.1/ = 1. This selects the non-inverted form of half the 
section's bits. Include 1.1/ = 1 in the section's bits. 

After one recursion each section is (P + 1) bits long. Furthermore, we are guaranteed for 
each section (1/4)P < W\ < (3/4)P. A second recursion proceeds as follows: 

• Combine two (P + l)-bit sections to form one larger section. 

• Calculate the weight, W 2 , of the section. We are guaranteed (1/2)P < W 2 < (3/2)P, 
but we would like to restrict the weight further. 

• As before, test whether W 2 is less than or exactly (1/4)P away from half the total 
number of bits in the section (i.e., test (P + 1) - (1/4)P < W 2 < (P + 1) + (1/4)P). 

• If false, invert half and include l.(N/P) + 1/ = in the section's bits. 

• If true, just include I.(iV/P) + 1/ = 1 in the section's bits. 

After the second recursion each section is 2(P + 1) + 1 bits long with (P + 1) — (1/4)P < 
W2 < (P + 1) + (1/4)P. Continue recursing until there is only one section. When finished, 
the number of possible different weights will equal (1/2)P + 1. Table 3.2 contains the tests 
required for various values of P with N = 64. 

Implementing ration coding using this algorithm involves trade-offs similar to those dis- 
cussed for table look-up. When choosing P one must realize that while small sections lead 
to a more constant output weight they require more recursions and, hence, more time, chip 
area, and additional output pins. Table 3.3 shows these trade-offs for N = 64. 
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Table 3.2: Tests required by recursive algorithm for various values of P with N = 64. 



Recursion 


P = 64 


P = 32 


P= 16 


1 


16 < W l < 48 


8 < Wi < 24 
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25 < W 2 < 41 
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3 


- 


- 
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4 
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- 


- 


5 


- 
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6 


- 


- 


- 
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2 


7 < W 2 < 11 
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W 2 = 3 


3 


17 < W 3 < 21 


10 < W 3 < 12 


W 3 = 7 


4 


37 < W A < 41 


22 < W 4 < 24 


W 4 = 15 


5 


- 


46 < W 5 < 48 


W 5 = 31 


6 


- 


- 


W 6 = 63 



Table 3.3: Trade-offs to consider when using recursive algorithm to implement ration coding. 
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encoded-bit=l 
I. (lst)/=0 



I. (2nd)/=l 
I. (3rd)/=0 




decoded-bit=l 



Figure 3-8: Decoding a bit which was encoded using the algorithm with three steps of recur- 
sion. (Note: I.(Kth)/ — means an inversion took place during the Kth recursion.) 

One distinct advantage of this implementation of ration coding is that it requires simple 
decode circuitry. Table look-up requires a set of decode tables. These tables consume more 
time and chip area than the encode tables. The recursive algorithm only requires circuitry 
to invert each encoded bit according to its set of invert signals. (In the case of K recursions, 
each bit has a set of K invert signals, I.(lst)/, l.(2nd)/, ..., I.(Kth)/.) The decode can be 
done in a time proportional to log2K with a tree of XNOR gates (see Figure 3-8). 

However, the recursive algorithm is not as efficient as table look-up; a constant output 
weight cannot be achieved even if the maximum number of recursion steps is implemented. 
Referring again to Table 3.3, the maximum number of recursion steps for N = 64 is six. 
Even with this many steps, the output can still differ in weight from cycle to cycle. (In this 
case the output weight can be either sixty-three or sixty-four.) Constant output weight can 
be achieved through several different implementations of ration coding using table look-up. 
(Refer to Table 3.1 again if necessary.) 

Another disadvantage of the ration-coding, recursive algorithm is that it requires count- 
ing and comparing; these operations are slow. Since the operations need not be precise for 
this application- a fast, approximate count-and-compare circuit can be built to mitigate the 
disadvantage. Figure 3-9 depicts an example of such a circuit. 

Implementing the algorithm with one step of recursion for a 32-bit output bus requires 
testing 8 < W\ < 24. Figure 3-9 shows schematically a fast circuit which performs part of 
this test. This circuit is fast because the count-and-compare operation is done in an analog 
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Figure 3-9: Fast, analog, count-and-compare circuit for implementing recursive algorithm 
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manner. The digital output of the comparator has the same logic state as the majority of the 
signals tied to its non-inverting input. (The voltage at each comparator input is the analog 
average of the forty-nine buffer /inverter digital outputs tied to it through the R resistors.) 
Seventeen additional buffer/inverter devices are used since the majority function is not desired 
here. They shift the switching point of the comparator to the desired location. Holding all 
seventeen at one logic level enables the comparator to differentiate between twenty-four high, 
of the remaining thirty-two buffer/inverter outputs, and twenty-five high. This, of course, 
permits a desired weight test (i.e., W\ > 24) on the 32-bit output word (A.l - A.S2). A 
second test, W\ < 8, can be similarly performed using another comparator and the inverting 
outputs of another set of buffer/inverter devices. NORing these results together produces the 
desired 8 < W\ < 24 signal. 

A trade-off is involved when choosing the resistor values. Rj must be close to the charac- 
teristic impedance of the transmission line. This value should not be so small that it demands 
too much current from the buffer /inverter devices. The circuit will work for a range of R val- 
ues, but keep it in mind that large R leads to a slow circuit while small R puts more current 
demand on the buffer /inverter devices. 

Other algorithmic implementations of ration coding have been considered. Unfortunately, 
all have been found to involve sophisticated computations (such as multiplication) and, hence, 
too much time. 

3.5 Ration Coding: Noise Reduction for All Ages 

Without changing its functionality, varying amounts of transmitted noise may be generated 
by a chip due to the signalling technology employed by a circuit. At one extreme of noise 
generation, there is single-ended signalling (i.e., one output signal for each information bit). 
This situation results in the maximum amount of transmitted noise. 

At the other extreme there is differential signalling. This is the brute-force approach to 
reducing the output-driver current variability; transmit the inverse of the output signals also, 
doubling the number of signals and reducing current variability (and, hence, transmitted 
noise) to its limit. Ration coding can be customized to reduce noise anywhere in the range 
from single-ended signalling to differential signalling. (In fact, differential signalling is exactly 
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an implementation of ration coding where each bit is encoded as two signals with a constant 
weight of one.) 

The advantages of single-ended signalling are obvious. If a circuit can handle the noise 
its chips are creating, then it makes no sense to waste time (both design time and execution 
time), chip area, and output pins on a nonexistent problem. 

However, if a circuit cannot handle the noise its chips are creating, it does not make sense 
to blindly double the output signals and employ differential signalling. One should consider 
using the ration coding techniques discussed previously. Time, chip area, and output signals 
can be balanced through ration coding to suit any application. 

Common mode rejection is a virtue of differential signalling. Referring to Figure 3-10, an 
output signal, A.x, and its inverse, A.x/ , encounter the same disturbances as they traverse 
from transmitting chip to receiving chip. For this reason the noise component, Vjv, for each 
signal is the same. Because this component is effectively subtracted out by the receiving 
chip's differential amplifier, the two signals are compared as though the noise did not exist. 
(Comparing A.x with A.x/ is the same as comparing A.x + V N with A.x/ + V N .) Common 
mode rejection is fundamental to differential signalling. 

In many cases simple additional input circuitry can be added to a ration-coding imple- 
mentation for it to exhibit common mode rejection. One example is shown in Figure 3-11. 
Imagine this is a table look-up implementation where the output word has been expanded 
from N bits with unrestricted weight to M bits with constant weight M/2. The lower input 
to all the differential amplifiers is the same. Due to the resistors (labeled R) the potential 
at this node (Vavg) is the average of all the signal-node potentials. Since half of the signal 
nodes are at Vol + V N and half are at V h + V N (a constant weight of M/2 guarantees this) 
the potential of the common node is V A vg = 1/2{V l + Voh) + V N . This implementation 
of ration coding exhibits common mode rejection since the noise (VJv) has no effect on the 
comparison made by the differential amplifiers. (Comparing A.x with 1/2(Vol + Voh) is the 
same as comparing A.x + Vn with 1/2(Vql + Voh) + Vn.) 
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Figure 3-10: Differential signalling exhibiting common mode rejection. 
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Figure 3-11: A ration-coding implementation exhibiting common mode rejection. 
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3.6 Demonstration of Ration Coding Reducing Noise 

I built a digital circuit to demonstrate ration coding reducing transmitted noise. A block 
diagram of this circuit appears as Figure 3-12. This circuit does not directly prove ration 
coding to reduce the type of transmitted noise discussed throughout this document (i.e., the 
voltage drop between chip and power supply due to the parasitic inductance of pin and bonding 
wire). Time constraints forbid this experiment. The circuit does demonstrate ration coding 
reducing another form of transmitted noise. The two types of transmitted noise have similar 
causes. Hence, the noise reduction measurements are instructive here. The analogy between 
the two forms of transmitted noise will be discussed after the operation of the circuit has been 
presented. 

The circuit consists of two component-carrying printed circuit boards: a board transmit- 
ting signals (PCB1) and a board receiving signals (PCB2). The experiment proceeds by, first, 
transmitting information with uncoded signals from PCB1 to PCB2 and measuring the noise 
generated. Next, use ration coding to encode the signals, transmit the same information, and 
measure the noise again. Comparison of the two measurements should show ration coding 
to have reduced the generation of noise. When applicable, a comparison with differential 
signalling should also be made. 

PCB1 contains TTL and ECL devices and operates as follows (see Figure 3-12): The 
counter addresses 256 consecutive bytes stored in the EPROM repetitively. The data bytes 
are sequentially loaded into octal flip-flops 1.1 through 1.8; the one-of-eight selection is made 
by the three LSB's of the counter. Every eight clock cycles the data in octal flip-flops 1.1 
through 1.8 is dumped into octal flip-flops 2.1 through 2.8. At this time sixty-four bits of 
data are transmitted over ribbon cable to PCB2. The ribbon cable is made of 128 wires. 
Every other wire in the cable is grounded so as to help eliminate the crosstalk which could 
potentially confuse results [BI088]. PCB2 contains only resistors since its job is simply to 
parallel-terminate the sixty-four received signals. 

The noise measured in the experiments was the difference in potential between the ground 
plane of PCB1 and that of PCB2. Ideally, both ground planes would be at zero volts resulting 
in no potential difference between the two. This voltage drop is analogous to the drop across 
the pin and bonding wire of a chip. In both cases the difference in potential occurs due to the 
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Figure 3-12: A block diagram of the ration- coding demonstration board. 
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parasitic inductance of wire. 

Current flows from PCB1 to PCB2 over the signal wires of the ribbon cable. Current 
must flow in a continuous loop. Therefore, the current returns to PCB1 over the ground wires 
of the ribbon cable. A large change in current over these ground wires will cause a voltage 
drop due to V = L(dl/dt). A large change in current results when successive output words 
have significantly different number of ones. Therefore, keeping the weight of successive output 
words relatively constant through ration coding should minimize the current changes. This, 
in turn, should minimize the potential difference between the ground planes (i.e., the noise 
measured in the experiment). Therefore, if noise reduction can be achieved through ration 
coding for my demonstration board, then it could also be achieved for integrated circuits. 

Prior to making any noise measurements, the EPROM must be programmed. Three 
EPROM data banks (of 256 bytes each) were programmed for each experiment: one with 
uncoded data, one with ration-coded data, and one with differential-signalling data. The 
programming was done so that in all three cases the output of PCB1 toggled between two 
64-bit words. This simplified result interpretation since one could examine the noise generated 
by a known change in weight. Of course, in each case the same amount of information was 
transmitted from PCB1 to PCB2. 

To make matters clear, let's look closely at the specific experiment where noise generated 
by worst case transitions was investigated. Figure 3-13 shows the pairs of output words PCB1 
toggled between during each of the three parts of the experiment. Figures 3-14 through 3-16 
include photographs of the resulting noise measured with an oscilloscope. 

In all experiments the first EPROM bank was programmed with uncoded data. For the 
specific, worst-case experiment the bank was programmed so that the output of PCB1 would 
toggle between a 64-bit word with weight zero and one with weight thirty-two. (Only thirty- 
two bits of information could be transmitted if a comparison with differential signalling was 
to be made. Recall, differential signalling requires two signals for each bit of information.) 
The noise measured during a transition from a word with weight zero to one with weight 
thirty- two is shown in Figure 3-14. 

The second EPROM bank was always programmed with the data resulting from a pseudo- 
implementation of ration coding. (I say pseudo-implementation since the coding was done on 
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Figure 3-13: The worst-case transition experiment requires PCB1 to toggle between three 
different pairs of output words (shown in hex). 




Figure 3-14: Noise generated during a transition from a word with weight zero to a word 
with weight thirty-two. The top oscilloscope trace shows one of the thirty-two output signals 
making a transition. The bottom trace shows the potential difference between the ground 
plane of PCB1 and that of PCB2. 
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Figure 3-16: Noise generated when implementing differential signalling. The top oscilloscope 
trace shows one of the sixty-four output signals making a transition. The bottom trace shows 
the potential difference between the ground plane of PCB1 and that of PCB2. 

of recursion and table look-up with various segment sizes). The resulting noise measurements 
uncovered no surprises. That is, in all experiments the noise measured was proportional to 
the weight variation of output words. Therefore, in any application the most extensive use 
of ration coding affordable is advised. Interested readers may consult the appendix for the 
results of the other experiments. 
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Chapter 4 

Reducing Transmitted Noise Using 
Starvation Coding 



This chapter presents a coding strategy for reducing transmitted noise in circuits exhibit- 
ing negligible steady-state current in their interconnection network. The conclusions reached 
apply to all such circuits. Because of its popularity, this document will often refer to a partic- 
ular technology, CMOS chips communicating with unterminated transmission lines, though 
the results are more generally applicable. 

4.1 An Example of Transmitted Noise in a CMOS Circuit 

On a write operation the Intel i860 microprocessor can drive sixty-four data lines and thirty- 
two address lines simultaneously [Int89]. Because as many as ninety-six output signals can 
change at once, precautions must be taken to prevent the i860 from generating an intolerable 
amount of transmitted noise. First, to better understand the problem, let's look closely at a 
write operation as if no precautions had been taken. Next, we will examine Intel's solution to 
the problem. Finally, we will evaluate a coding solution. 

Consider a worst-case write operation; ninety-five output signals are high and are changing 
to low (H->L) and one output signal is low and is staying low (L->L). The output circuitry of 
the i860 is shown schematically in Figure 4-1. For now, we will assume the i860 has one Vcc 
(or 5V) pin and one Vss (or GND) pin, each exhibiting some parasitic inductance (labeled 
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Figure 4-1: 96-bit parallel output circuit of the i860 driving unterminated transmission lines. 
Figure assumes one ground pin (Vss) and one power pin(Vcc)- 
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L\ and L2, respectively). Furthermore, we will assume each output is loaded with 50pF of 
interconnect and input capacitance (Cload)- Intel specifies the slew time of the i860 outputs 
(i.e., transition time between Vout - O.W and Vout = 2.0V) as a function of the load 
capacitance. It is not unreasonable to want the slew time (Ts) to be 1.0ns for a capacitive 
load of 50pF [TK85]. (This is not the slew time Intel specifies. Slowing this edge time is one 
of the ways Intel reduces noise. We will proceed here with a desirable and achievable slew 
time of 1.0ns.) These numbers mean that each output can discharge its load capacitance from 
2.0F to 0.8F in 1.0ns. The resulting current through each of the ninety-five NFETs (labeled 
I n for n = 1 to 95) can be approximated as follows: 

In = CW^P) (4-1) 



In * C L OAD{ ™ ) ( 4 - 2 ) 



I n « 50pF( 2 '°V» °' 8F ) = 60.0mA (4.3) 

1.0ns 

The sole Vss P m sinks all ninety-five output currents. That is, 

Iss = Ii+l2 + h + - + hs = 95(60.0mA) = 5.70A (4.4) 

The changing current on the Vss P m causes a voltage drop due to the parasitic inductance 
i/2- Assume the pin and bonding wire measure 1.0cm. As before, this yields an inductance of 
10.0n#. Using these numbers, the voltage drop across L2 can be approximated as follows: 

V N OISE2 = ^(^f ) ( 4 - 5 ) 



VNOISE2 ~ L 2 (^) (4.6) 

J-s 



c; 704 
Vnoisei « 10.0nH(— — ) = 57.0F (4.7) 

1.0ns 
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According to this argument, the voltage at the static output (VouT9e of Figure 4-1) should 
get as high as 57.0F. Of course, this is absurd since none of the capacitive loads would be 
discharging if this were the case. Nonetheless, the noise is obviously large enough to warrant 
attention. 

Intel takes this problem seriously. The i860 has twenty-four Vqc pins and twenty-four Vss 
pins [Int89]. Depending on how many of these pins are used by the output drivers, this should 
reduce the transmitted noise by as much as a factor of twenty-four. Notice, however, in the 
worst case this is not enough. 

The output bandwidth of the i860 suffers severely as a result of Intel's other attempt to 
reduce transmitted noise; the output drivers are intentionally designed to have slow transition 
times. Intel specifies the output slew time of the i860 to be about 6.0ns with a 50pF load 
capacitance [Int89]. The output drivers are substantially slower than they could be; above 
we assumed reasonably a slew time of 1.0ns. This reduces noise by as much as a factor of 
thirty-six (notice Ts enters into the above analysis twice) but costs a factor of six in output 
bandwidth. 

Unfortunately, there is no coding scheme which alone can solve this noise problem. For 
circuits in this class (i.e., circuits exhibiting negligible steady-state current) it is impractical 
to use coding to reduce noise by a factor of four or more. This fact will be demonstrated 
shortly. However, it still makes sense to consider coding techniques as a partial solution to 
problems like above or as a total solution to less severe problems. 

4.2 Starvation Coding as a Solution 

A type of noise-reduction coding, termed starvation coding earlier, can reduce noise in circuits 
like the one described in the previous section. From equation 4.5 it is apparent that no noise 
will be generated if the current on the Vss pin does not change. Unfortunately, this requires 
the output signals not to change. Of course, static signals convey no information. Therefore, 
for circuits exhibiting negligible steady-state current in their interconnection network, only 
limited noise reduction can be achieved through coding. 

Starvation coding cannot eliminate noise, but it can reduce it. Referring to equation 4.5 
again, less noise will be generated if the current on the Vss pin changes less drastically. This 
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Figure 4-2: A circuit for implementing transition signalling. 

requires fewer transitions by the output signals. Therefore, a coding strategy can reduce the 
transmitted noise generated by a set of outputs if it can reduce the maximum number of 
transitions experienced by the set while preserving its information-carrying capacity. 

Reducing transitions is a complex problem; it requires memory. Before transmitting a new 
set of outputs, the set must be compared with the previous set transmitted. The number of 
transitions must be counted prior to taking any coding actions (e.g., inverting some of the 
bits). It would be nice to avoid counting since it is slow. This can be achieved through a 
technique called transition signalling. 

A circuit for implementing transition signalling is shown schematically in Figure 4-2. This 
circuit conceptually simplifies the problem of reducing transitions to the problem of reducing 
l's. Notice a bit of TRANSMIT.WORD changes state only if a 1 appears in the corresponding 
bit position of OUTPUT-WORD. Since transitions are encoded as l's, starvation coding 
techniques can minimize the l's in OUTPUT_WORD and, hence, reduce transmitted noise. 
(For this scheme to work the transmitting and receiving registers must always contain the same 
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word. Equality is established prior to transmitting any data by CLEARing the registers. I 
encourage readers to convince themselves that OUTPUT-WORD and INPUT.WORD are 
always the same.) 

Power conservation is a desirable side effect of starvation coding. Since fewer output 
transitions occur, the capacitive loads need to be charged less frequently. This, of course, 
results in a power savings. Indulgence coding (i.e., coding so as to maximize the number of 
l's in a word) is the the logical inverse of starvation coding; It can reduce output transitions 
and, hence, transmitted noise as effectively as starvation coding. 

4.3 Theoretic Limits of Starvation Coding 

As mentioned earlier, starvation coding cannot eliminate noise. In fact, reducing transmitted 
noise by a factor of four or more appears impractical. A rigorous mathematical proof of this 
statement is complex. I will demonstrate this fact with two examples. First, I will examine 
the maximum noise reduction achievable. Then, I will examine noise reduction by a factor of 
four. 

Since static signals convey no information, transmitted noise is at a minimum when a chip's 
outputs experience one transition per cycle. This can be achieved with transition signalling 
and a starvation-coding scheme which limits words to a weight of one. Refering to Figure 4-2 
again, this corresponds to having a starvation-coding box with 2 outputs; there must be one 
output for each possible iV-bit word. Of course, it is impractical to exponentially increase the 
number of outputs to reduce transmitted noise. 

Unfortunately, reducing transmitted noise by as little as a factor of four appears to be 
impractical as well. Starving an 8-bit output word to weight two reduces transmitted noise 
by a factor of four. Unfortunately, this requires increasing the output word to twenty-three 
bits. This is demonstrated by the following analysis. 

Starving an TV-bit output word to weight W requires expanding the word to M bits. In 
order for M-bit words of maximum weight W to have the information capacity to replace 
JV-bit words of unrestricted weight the following must hold: 

C(M, 0) + C(M, 1) + C(M, 2) + ... + C(M, W) > 2 N (4.8) 
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Setting JV = 8 and W = 2 and solving for M yields, 

C(M, 0) + C(M, 1) + C(Af , 2) > 2 8 = 256 (4.9) 



M! Ml M\ 

+ i tf ,/-n. + oi,i., om * 256 ( 4 - 10 ) 



0!(M-0)! 1!(M-1)! 2!(M - 2)! 

1 + M + (1/2)M(M-1)>256 (4.11) 

(l/2)Af 2 + (l/2)M-255>0 (4.12) 

The quadratic formula gives the following conditions for M : 

(M > 22.1) V (M < -23.1) (4.13) 

Of course, M must be positive. Therefore, if the output weight is restricted to two, then at 
least twenty-three signals are required to transmit eight bits of information. 

This is no surprise when one considers the information capacity of a 23-bit word. Figure 4- 
3 plots the number of different 23-bit words as a function of weight (i.e., C(2S,W)). Notice 
that much fewer low-weight words exist than average-weight words. Figure 4-4 contains the 
same plot expanded in the area of interest. I have shaded the region corresponding to words of 
maximum weight two. Starvation coding uses less than 0.004% of the total number of 23-bit 
words. 

Table 4.1 contains the output signals required (M) to transmit various amounts of infor- 
mation (N, in bits) when reducing transmitted noise by a factor of four. The required number 
of output signals for large N appears tolerable. (For N = 32 it is less than twice N.) We will 
see in the next section, however, that no implementation technology we are aware of exists to 
make these cases practical. (Of course, one may be invented.) 

For circuits using series-terminated or un-terminated transmission lines, coding to reduce 
transmitted noise by a factor of four or more is fundamentally inefficient. This should be 



48 



Total Words 

6 

1.5 10 

1.2 10 
900000.-- 

600000.-- 

300000.-- 



M=23 




Weight 



Figure 4-3: The number of different 23-bit words plotted as a function of weight, C(23, W). 
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Figure 4-4: A plot of C'(23,W) expanded to show detail near W = 2. 
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Table 4.1: Output signals required (M) for various amounts of information (N, in bits) when 
reducing transmitted noise by a factor of four. 



JV 


Wmax 


M 


4 


1 


15 


8 


2 


23 


16 


4 


36 


32 


8 


63 



obvious from Figures 4-3 and 4-4; simply not enough low- weight words exist. However, this 
should not discourage the reader from using coding in general. The next section contains 
examples of starvation coding practically implemented to reduce transmitted noise by less 
than a factor of four. 

4.4 Implementations of Starvation Coding 

As with ration coding, starvation coding can be implemented in many ways. The implementa- 
tion technologies fall in two categories: algorithmic implementations and table look-up. The 
remainder of this chapter contains examples of these techniques. The section on algorithmic 
implementations is, unfortunately, quite incomplete. 

4.4.1 Algorithmic Implementations 

Starvation coding can achieve a 50% noise reduction with a simple algorithm. Figure 4-5 
shows schematically a circuit implementing this algorithm. This circuit reduces the worst- 
case number of transitions by 50%. It does not require the transition-signalling circuit of 
Figure 4-2. The scheme is very straightforward: 

• Store the last transmitted word in a register. 

• Compare this with the next word to be transmitted. 

• If the words differ in more than half the bit positions, bit-wise invert the entire word 
prior to transmitting it. 
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Figure 4-5: An algorithmic implementation of starvation coding which reduces transmitted 
noise by 50%. 
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Table 4.2: The transitions saved on the average using the starvation-coding algorithm with 
various segment sizes. 



N/K 


Tavg 


Tavg,starve 


Savings 


4 


2 


1.56 


21.9% 


8 


4 


3.27 


18.3% 


16 


8 


6.83 


14.6% 


32 


16 


14.2 


11.3% 



• Transmit an additional signal (INVERT) to inform the receiver whether or not the 
inversion took place. 

This scheme is attractive since it reduces the worst-case number of transitions by a factor 
of two with the addition of only one signal. Furthermore, the receiving chip requires only 
simple circuitry to interpret the transmitted word. Unfortunately, one cannot recurse on the 
idea to achieve greater reductions with the addition of more signals. 

An additional disadvantage of this scheme is it requires counting and comparing; these 
operations are slow. Since in this application the operations need not be accurate- fast, 
approximate count-and-compare circuitry can be built to mitigate the disadvantage. This 
idea was discussed previously with reference to a ration-coding implementation. An analog 
count-and-compare circuit could be built in a style similar to that of the circuit shown in 
Figure 3-9. 

An issue to consider with this scheme is whether or not to segment the output word and 
compare cycle-to-cycle words in sections rather than in entirety. For an N-bit output bus, 
building K circuits like the one shown in Figure 4-5 where each processes a single (N/K)-bit 
section has advantages. This would require more chip area and the transmission of more signals 
(K INVERT signals, one for each section) but would simplify the necessary count-and-compare 
circuitry since only N/K bits would need to be counted and compared. Furthermore, this 
would reduce the average number of transitions per cycle and, hence, the power consumption. 

Table 4.2 contains data describing the transitions saved on the average using the starvation- 
coding algorithm with various segment sizes. An (JV/A')-bit uncoded section experiences, on 
the average, T^vg = N/(2K) transitions per cycle. The starvation-coding algorithm reduces 
the average number of transitions to Tavg,STARVE- Notice Savings increases as the section 
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size decreases. This translates directly into a power savings. Since it is common for half or 
more of the dissipated power in a high speed CMOS system to be devoted to driving a highly 
capacitive output bus [TK85] this savings is significant. However, since the worst case remains 
at N/(2K) transitions, segmenting the output does not further alleviate the transmitted noise 
problem. 

4.4.2 Table Look-Up 

Starvation coding can be implemented with look-up tables. We can choose positive integers 
P, Q, and R such that the following holds: 

C(Q,0) + C(Q, 1) + C(Q, 2) + ... + C(Q,R) > 2 P (4.14) 

This enables us to encode a P-bit word of unrestricted weight as a Q-bit word with maximum 
weight R. This, of course, is starvation coding. We are starving a P-bit word to weight R. 

In cases where the output word is greater than about ten, it will be necessary to segment 
it into manageably sized sections. Using this idea we can segment our iV-bit output word into 
(N/P) sections. Each P-bit section can be starved to Q bits of maximum weight R by its 
own look-up table. In total N/P look-up tables can starve the JV-bit output word to M bits 
with maximum weight Wmax, where M = (N/P)Q and Wmax = (N/P)R. 

Choosing the values of P, Q, and R involves many trade-offs. Table 4.3 shows possible 
combinations of these parameters for N — 64 as well as their influence on various design 
characteristics including maximum output weight (Wmax), total pins (M), table area for 
the transmitting chip (At), and table area for the receiving chip (Ar). Using the transition- 
signalling circuit of Figure 4-2, a look-up scheme of Table 4.2 will reduce the worst-case noise 
by a factor of 64/Wmax- Ax and Ar were calculated as discussed in section 3.4.1 with the 
following formulas: 

A T = (2 P * Q)(N/P) (4.15) 

Ar = (2« * P)(N/P) (4.16) 
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Table 4.3: Combinations of P, Q, and R with their influence on various design characteristics. 



p 


Q 


N/P 


R 


Wmax 


M 


At 


Ar 


Savings 


10 


19 


6.4 


3 


22 


118 


116,736 


3.15 x 10 7 


44.1% 


10 


13 


6.4 


4 


28 


82 


79,872 


491,520 


29.5% 


10 


11 


6.4 


5 


34 


70 


67,684 


122,880 


17.1% 


9 


32 


7.11 


2 


15 


225 


114,688 


2.71 x 10 11 


57.0% 


9 


15 


7.11 


3 


22 


106 


53,760 


2.06 x 10 6 


39.3% 


9 


11 


7.11 


4 


29 


78 


39,424 


129,024 


24.7% 


8 


23 


8 


2 


16 


184 


47,104 


5.37 x 10 8 


52.4% 


8 


12 


8 


3 


24 


96 


24,576 


262,144 


34.1% 


8 


9 


8 


4 


32 


72 


18,432 


32,768 


18.3% 


7 


16 


9.14 


2 


19 


145 


18,432 


4.13 x 10 6 


46.9% 


7 


9 


9.14 


3 


28 


82 


10,368 


32,256 


27.0% 


6 


11 


10.67 


2 


22 


121 


7744 


135,168 


40.1% 


6 


7 


10.67 


3 


33 


77 


4928 


8448 


19.3% 


5 


8 


12.8 


2 


26 


104 


3328 


16,640 


32.5% 


4 


15 


16 


1 


16 


240 


3840 


2.10 x 10 6 


53.1% 


4 


5 


16 


2 


32 


80 


1280 


2048 


21.9% 


3 


7 


21.33 


1 


22 


148 


1176 


8064 


41.7% 


2 


3 


32 


1 


32 


96 


384 


512 


25.0% 



Table 4.3 contains combinations of P, Q, and R where P is no larger than ten and transmitted 
noise is reduced by about a factor of two or more. Some combinations are obviously impractical 
because they require too many output signals or too much chip area. The last column of the 
table indicates the power savings I expect from a particular implementation. The entries 
are actually equal to the savings in average number of output transitions, but, as argued 
previously, the two are proportional. Savings was calculated as follows: 



Savings = 1 



W AVG 
(P/2) 



(4.17) 



W A vg is the average weight of the section and was calculated as follows: 



W, 



AVG 



(O)C(Q, 0) + (l)C(Q, 1) + (2)C(Q, 2) + ... + (R- 1)C(Q, R-l) 



2 p 



+ 



(R)(2 P - [C(Q,Q) + C(Q, 1) + C(Q, 2) + ... + C(Q,R- 1)]) 

2 p 



(4.18) 
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Chapter 5 

Conclusions and Future 
Investigations 



This document developed a weight-based, noise-reduction coding scheme termed ration coding 
which reduces transmitted noise in circuits exhibiting steady-state current in their interchip 
signalling networks such as ECL circuits with parallel- terminated transmission lines. Ration 
coding achieves the noise reduction of differential signalling at smaller costs. Differential 
signalling requires doubling output pins and wires. This is a big price to pay especially 
in state-of-the-art digital systems; accommodating the wires is a primary concern in large, 
multiprocessor computers. 

Ration coding permits trading output wires for design time, execution time, and chip area. 
Design time is a fixed cost associated with all new ideas. In a pipeline environment the cost 
of adding an extra stage or two to hide the execution time should not be too severe. It would 
often make sense to trade chip area for output pins even if extra chip area was not readily 
available. Therefore, in many cases replacing differential signalling with an implementation 
of ration coding provides an increase in benefits. 

This document also developed a weight-based, noise-reduction coding scheme termed star- 
vation coding which reduces transmitted noise in digital circuits exhibiting negligible steady- 
state current in their interchip signalling networks such as CMOS circuits with un-terminated 
transmission lines. Starvation coding can reduce transmitted noise by as much as a factor of 



56 



three. A fundamental limitation makes greater reductions impractical. Nonetheless, starva- 
tion coding is a valuable addition to the noise-reduction arsenal. 

The development of new ration-coding and starvation-coding techniques will greatly en- 
hance the use of coding to reduce transmitted noise. Unfortunately, no coding research effort 
concerning the regulation of binary weight has produced a scheme applicable here. At this 
time the most applicable strategy appears to be table look-up. Look-up tables require small 
design and execution times but significant chip area. 

Those interested in furthering these investigations should consider two interesting possi- 
bilities. The codes described here have advantages not only at the chip boundary but also 
at the card/backpanel boundary. In particular, if a memory system is built to handle coded 
data, it might be easier to store data in coded rather than decoded form. Combining ration 
or starvation coding with coding for error detection/correction might make an attractive code 
for storing in a memory system. The combined codes, if they can be developed, might be able 
to combine both features into a smaller codeword than would be required by simply summing 
the extra code bits used [TK85]. 

The second idea is somewhat of a fantasy. Since eight bits of data can be encoded as 
eleven bits with weight four, it is conceivable to build an entire ECL computer with virtually 
no transmitted noise. In this computer each byte would be expanded to eleven bits with 
weight four. If it were possible to build modules that operated on 11-bit words of weight four 
and produced 11-bit words of weight four, then an entire ECL computer could be built where 
the number of l's in the data paths was always constant. This would lead to virtually no 
transmitted noise. This idea may seem a little far-fetched since it would be difficult to build 
a module that, say, added two 11- bit words of weight four and produced an 11-bit result of 
weight four, but it is conceivable nonetheless. 
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Appendix A 

More Demonstrations of Ration 
Coding Reducing Noise 



The appendix contains the experimental results promised in section 3.6. Each picture shows 
the noise generated when transmiting thirty-two bits of information. For the pictures contain- 
ing two oscilloscope traces, the top trace shows a representative signal being transmitted from 
PCB1 and the bottom trace shows the noise. For the pictures containing three oscilloscope 
traces, the top trace shows a representative signal being transmitted from PCB1, the bottom 
trace shows a representative signal being received by PCB2, and the middle trace shows the 
noise. 
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Figure A-l: Noise generated by a transition from an output word with weight seven to a word 
with weight twenty-five. 




Figure A-2: A pseudo-implementation of the ration-coding algorithm with one step of recur- 
sion reducing the noise from Figure A-l. 
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Figure A-5: A pseudo-implementation of ration-coding with two steps of recursion reducing 
the worst-case noise from Figure A-4. 




Figure A-6: Worst-case noise generation with two steps of recursion. This occurs when an 
output word with weight twenty-one is transmitted after a word with weight thirteen. 
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Figure A- 7: Worst-case noise generation with three steps of recursion. This occurs when an 
output word with weight twenty-one is transmitted after a word with weight seventeen. 




Figure A-8: Noise generated by a transition from an output word with weight zero to a word 
with weight thirty-two. 
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Figure A-9: Table look-up implementation reducing worst-case noise from Figure A-8. Here, 
8-bit words are encoded as eleven bits with weight four. 




Figure A- 10: Table look-up implementation reducing worst-case noise from Figure A-8. Here, 
4-bit words are encoded as six bits with weight three. 
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Figure A- 11: Table look-up implementation reducing worst-case noise from Figure A-8. Here, 
2-bit words are encoded as four bits with weight one. 




Figure A- 12: Table look-up implementation reducing worst-case noise from Figure A-8. Here, 
6-bit words are encoded as eight bits with weight four. 
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Figure A- 13: Table look-up implementation reducing worst-case noise from Figure A-8. Here, 
3-bit words are encoded as five bits with weight two. 




Figure A- 14: Differential signalling reducing the worst-case noise from Figure A-8. 
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